I have been dealing with an interesting forking issue at work. It
happens to involve Perl, but don't let that put you off.
So, suppose you need to perform an I/O-bound task that is eminently
parallelizable (in our case, generating and sending lots of emails).
You have learnt from previous such attempts, and broken out
Parallel::Iterator from CPAN to give you easy fork()ing goodness.
Forking can be very memory-efficient, at least under the Linux
kernel, because pages are shared between the parent and the children
via a copy-on-write system.
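Parallel::Iterator's interface is pleasantly small. A minimal sketch
of the basic pattern - each call to the worker runs in a forked
child, and the results are gathered back in the parent:

use strict;
use warnings;
use Parallel::Iterator qw( iterate_as_array );

# The worker receives ( $index, $value ) and its return value is
# collected into @doubled in the parent.
my @doubled = iterate_as_array(
    sub {
        my ( $index, $value ) = @_;
        return $value * 2;
    },
    [ 1 .. 10 ],
);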
Further suppose that you want to generate a large data structure in
the parent and share it with the children, so that you can iterate
over it. Copy-on-write pages should be cheap, right?
my $large_array_ref = get_data();
my $iter = iterate( sub {
    my $i       = $_[1];
    my $element = $large_array_ref->[$i];
    ...
}, [0..1000000] );
Sadly, when you run your program, it gobbles up memory until the
OOM killer steps in.
Our first problem was that the system malloc implementation
performed worse on this particular workload than Perl's built-in
malloc. Not a problem: we were using perlbrew anyway, so a few quick
experimental rebuilds later this was solved.
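For the record, switching to Perl's own malloc is a one-line rebuild,
since perlbrew passes -D options straight through to Configure. The
version number here is only an example:

perlbrew install perl-5.18.2 -Dusemymalloc=y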
More interesting was the slow, 60 MB/s leak that we saw after that.
There were no circular references, and everything was going out of
scope at the end of the function, so what was happening?
Recall that Perl uses reference counting to track memory
allocation. In the children, taking a reference to an element of the
large shared data structure increments that element's reference
count - and the count is stored in the element's own header, so even
this apparently read-only access writes to the relevant page in
memory, and the kernel duly copies it. Over time, as we iterated
through the entire structure, the children would end up copying
almost every page! This would double our memory costs.
(We confirmed the diagnosis using 'smem', incidentally. Very
useful.)
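You can watch the mechanism in action with the core Devel::Peek
module. A small sketch, not our production code:

use strict;
use warnings;
use Devel::Peek;

my $records = [ { id => 1 } ];

Dump( $records->[0] );           # the inner hash reports REFCNT = 1
my $element = $records->[0];     # copy the reference, as our worker did
Dump( $records->[0] );           # now REFCNT = 2 - the page holding
                                 # the hash header has been written to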
The copy-on-write semantics of fork() do not play well with
reference-counted interpreters such as Perl or CPython. Apparently a
similar issue occurs with some mark-and-sweep garbage-collection
implementations - but Ruby 2.0 is reputed to be COW-friendly.
All was not lost, however - we just needed to avoid taking any
references! The trick is to deep-copy each element without saving
any intermediate references along the way. This can be a bit
long-winded, but it works.
my $large_array_ref = get_data();
my $iter = iterate( sub {
    my $i = $_[1];
    my %clone;
    $clone{id}  = $large_array_ref->[$i]{id};
    $clone{foo} = $large_array_ref->[$i]{foo};
    ...
}, [0..1000000] );
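The crucial difference from the first version is what gets copied.
Copying a reference bumps the referent's count; copying a plain
scalar leaves the source untouched:

my $element = $large_array_ref->[$i];      # copies a reference: the
                                           # hash's REFCNT goes up
my $id      = $large_array_ref->[$i]{id};  # copies a plain scalar:
                                           # no reference count changes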
This could be improved if we wrote an XS CPAN module that cloned
data structures without incrementing any reference counts - I presume
this is possible. We tried the most common deep-copy modules from
CPAN, but have not yet found one that avoids reference counting.
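In the meantime, one way to audition a candidate deep-copy routine,
at least on Linux, is to measure how much private (unshared) memory
a forked child accumulates while cloning. A rough sketch, where
clone_candidate() stands in for whatever routine you are testing:

use strict;
use warnings;

# Sum the Private_Dirty fields in /proc/self/smaps: pages this
# process has written to, and which are therefore no longer shared.
sub private_dirty_kb {
    open my $fh, '<', '/proc/self/smaps' or die "smaps: $!";
    my $kb = 0;
    while ( my $line = <$fh> ) {
        $kb += $1 if $line =~ /^Private_Dirty:\s+(\d+) kB/;
    }
    return $kb;
}

my $data = get_data();    # built before the fork, so initially shared

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {    # in the child
    my $before = private_dirty_kb();
    my $copy   = clone_candidate($data);    # hypothetical routine under test
    printf "cloning unshared %d kB\n", private_dirty_kb() - $before;
    exit 0;
}
waitpid( $pid, 0 );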
This same problem almost certainly shows up when using the Apache
prefork MPM and mod_perl - even read-only global variables can
become unshared.
I would be very interested to learn of any other approaches people
have found to solve this sort of problem - do email me.